
    How to Address the Data Quality Issues in Regression Models: A Guided Process for Data Cleaning

    Today, data availability has gone from scarce to superabundant. Technologies such as IoT, trends in social media, and the capabilities of smartphones are producing and digitizing large amounts of data that was previously unavailable. This massive increase in data creates opportunities for new business models, but it also demands new techniques and methods for ensuring data quality in knowledge discovery, especially when the data comes from different sources (e.g., sensors, social networks, cameras). Applying a data quality process to a dataset yields conclusions about the information it contains, and this is increasingly done with the aid of data cleaning approaches. Guaranteeing high data quality is therefore considered the primary goal of the data scientist. In this paper, we propose a process for data cleaning in regression models (DC-RM). The proposed process is evaluated on real datasets from the UCI Repository of Machine Learning Databases. To assess it, the datasets cleaned by DC-RM were used to train the same regression models proposed by the authors of the UCI datasets. The models trained on the data produced by DC-RM achieve results better than or equal to those reported by the datasets' authors. This work has also been supported by the Spanish Ministry of Economy, Industry and Competitiveness (Projects TRA2015-63708-R and TRA2016-78886-C3-1-R).

    Machine Learning in Heterogeneous Classifier Ensembles and Agent Modeling

    In recent years, one of the most active research areas in machine learning has been ensembles of classifiers, which combine the decisions of individual classifiers so that the class assigned to a new instance is decided by the ensemble as a whole. Many techniques generate such ensembles, ranging from manipulating the input data to using meta-learning. One way to categorize them is by the number of different learning algorithms used to generate the ensemble members: techniques that use a single algorithm produce homogeneous ensembles, whereas techniques that use more than one algorithm produce heterogeneous ensembles. A well-known method for generating heterogeneous ensembles is Stacking, which, besides building the ensemble members with different learning algorithms, uses two levels of learning. The first level, or level-0, learns directly from the domain data, whereas the meta-level, or level-1, learns from data generated by the level-0 classifiers. An inherent problem of Stacking is determining the right configuration of its learning parameters, including which learning algorithms, and how many of them, should be used to generate the ensemble. Previous work has shown that no single number of algorithms is optimal for all domains, and which algorithms should be used is not well defined either, although some works use representative algorithms of each type.
    One goal of this thesis is to use Genetic Algorithms as an optimization technique to determine which algorithms, and how many, should be used to generate the ensemble of classifiers, as well as the configuration of their learning parameters. The proposed method is therefore domain-independent, while the Stacking configuration it finds is adapted to each particular domain.
    The growth of e-commerce and World-Wide-Web applications has increased the number of environments in which agents interact. These environments include competitive and/or collaborative situations in which knowledge about the other individuals involved provides a clear advantage when deciding which action to perform. One way to acquire this knowledge is by modeling the behavior of the other agents, and there are several ways to build such a model. Some techniques use previously constructed models and try to match the observed behavior to one of them. Others assume that the agent being modeled follows an optimal strategy and build a model of its behavior from that assumption. A second goal of this thesis is to create a general framework for agent modeling based on observing the behavior of the agent to be modeled. To this end, machine learning techniques are proposed to carry out the modeling task, based on the relationship between the agent's inputs and outputs.
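
    Stacking, as described above, trains level-0 classifiers on the domain data and a level-1 (meta) classifier on data generated by them. What follows is a minimal pure-Python sketch under stated assumptions: the three level-0 learners (nearest centroid, 1-nearest-neighbour, majority class) and the lookup-table meta-learner are hypothetical toy choices, not the algorithms used in the thesis, and the level-1 data is produced with k-fold cross-validation, a common way to keep level-0 predictions from being evaluated on their own training examples.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]
Model = Callable[[Vector], int]
Learner = Callable[[Sequence[Vector], Sequence[int]], Model]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(X: Sequence[Vector], y: Sequence[int]) -> Model:
    """Hypothetical level-0 learner: predict the class with the closest mean."""
    groups = {}
    for x, c in zip(X, y):
        groups.setdefault(c, []).append(x)
    centroids = {c: tuple(sum(col) / len(pts) for col in zip(*pts))
                 for c, pts in groups.items()}
    return lambda x: min(centroids, key=lambda c: sq_dist(x, centroids[c]))


def one_nn(X: Sequence[Vector], y: Sequence[int]) -> Model:
    """Hypothetical level-0 learner: copy the label of the nearest example."""
    data = list(zip(X, y))
    return lambda x: min(data, key=lambda d: sq_dist(x, d[0]))[1]


def majority(X: Sequence[Vector], y: Sequence[int]) -> Model:
    """Hypothetical level-0 learner: always predict the most frequent class."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label


def lookup_meta(Z, y) -> Callable[[Tuple[int, ...]], int]:
    """Hypothetical level-1 learner: map each combination of level-0
    predictions to the most frequent true class seen with it; fall back
    to a plain vote over the level-0 predictions for unseen combinations."""
    table = {}
    for z, c in zip(Z, y):
        table.setdefault(z, Counter())[c] += 1

    def predict(z):
        votes = table.get(z) or Counter(z)
        return votes.most_common(1)[0][0]
    return predict


def stack(X, y, base: List[Learner], meta, k: int = 3) -> Model:
    n = len(X)
    Z = [None] * n
    # Level-1 data: each example is described by the predictions of level-0
    # models trained WITHOUT it (k-fold cross-validation).
    for f in range(k):
        train = [i for i in range(n) if i % k != f]
        models = [L([X[i] for i in train], [y[i] for i in train]) for L in base]
        for i in range(f, n, k):
            Z[i] = tuple(m(X[i]) for m in models)
    meta_model = meta(Z, y)
    # The level-0 models used at prediction time are retrained on all data.
    final = [L(X, y) for L in base]
    return lambda x: meta_model(tuple(m(x) for m in final))


# Hypothetical toy data: two well-separated 2-D clusters.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.1, 0.0),
     (1.0, 1.0), (0.9, 1.1), (1.1, 0.9), (1.0, 0.8)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
clf = stack(X, y, [nearest_centroid, one_nn, majority], lookup_meta)
```

    Adding or removing learners in the `base` list, or swapping the meta-learner, is precisely the kind of configuration choice the thesis proposes to optimize with a Genetic Algorithm.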

    IoT application for energy poverty detection based on thermal comfort monitoring

    This work describes the development of a datalogger for identifying Energy Poverty (EP) through thermal comfort monitoring. There is no uniform definition of EP, and there are no global recommendations on which thermal comfort characteristics should be used to identify it. Most Internet of Things (IoT)-based systems designed for EP identification measure energy consumption (electricity and gas), and few works use IoT-based systems to identify EP by monitoring thermal comfort parameters. To address these gaps in identifying EP from the perspective of thermal efficiency, an IoT-based monitoring system was designed, developed, and tested. A first pilot was installed in a household in Getafe. A full month of temperature, relative humidity, and CO2 concentration measurements was used to evaluate the system, which was then compared to a commercial system. The results showed that the new IoT-based approach was highly reliable and can be used to monitor EP-related parameters accurately. This work was supported by the European Commission through Urban Innovative Actions of the EPIU Getafe Project under Grant UIA04-212. The work of Dr. Agapito Ledezma was supported by the Agencia Estatal de Investigación (AEI) under Grant PID2021-124335OB-C22.

    From continuous behaviour to discrete knowledge

    Proceedings of: 7th International Work-Conference on Artificial and Natural Neural Networks, IWANN 2003, Maó, Menorca, Spain, June 3-6, 2003, Proceedings, Part II. Neural networks have proven to be very powerful techniques for solving a wide range of tasks. However, the concepts they learn are not readable by humans. Some works try to extract symbolic models from networks once they have been trained, so that the model can be understood through decision trees or rules, which are closer to human understanding. The main problem with this approach is that neural networks output a continuous range of values, so even if a symbolic technique able to handle continuous classes were used, its output would still be hard for humans to understand. In this work, we present a system that models a neural network's behaviour by discretizing its outputs with a vector quantization approach, so that the symbolic method can then be applied.
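
    The discretization step can be illustrated with vector quantization: a codebook of k codewords is fitted to the network's outputs, and each continuous output is replaced by the index of its nearest codeword, which a symbolic learner can then treat as a discrete class. This is a minimal sketch assuming Lloyd's algorithm (k-means) for codebook design with a deterministic farthest-point initialization; the abstract does not say which quantizer the authors used, and the `outputs` values are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook: Sequence[Point], p: Point) -> int:
    """Index of the codeword closest to p, i.e. the discrete symbol for p."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], p))


def train_codebook(points: Sequence[Point], k: int, iters: int = 50) -> List[Point]:
    """Lloyd / k-means codebook design (an assumed choice of quantizer)."""
    # Farthest-point initialisation keeps the sketch deterministic.
    codebook = [points[0]]
    while len(codebook) < k:
        codebook.append(max(points, key=lambda p: min(sq_dist(p, c) for c in codebook)))
    for _ in range(iters):
        cells: List[List[Point]] = [[] for _ in codebook]
        for p in points:
            cells[nearest(codebook, p)].append(p)
        # Move each codeword to the mean of its cell; keep it if the cell is empty.
        updated = [tuple(sum(col) / len(cell) for col in zip(*cell)) if cell else c
                   for c, cell in zip(codebook, cells)]
        if updated == codebook:  # converged: assignments no longer change
            break
        codebook = updated
    return codebook


# Hypothetical one-dimensional outputs of a trained network.
outputs = [(0.05,), (0.1,), (0.12,), (0.5,), (0.52,), (0.9,), (0.95,)]
codebook = train_codebook(outputs, k=3)
labels = [nearest(codebook, o) for o in outputs]  # [0, 0, 0, 2, 2, 1, 1]
```

    The pairs (network input, `labels[i]`) would then form an ordinary classification training set for a decision tree or rule learner, which is the role the symbolic method plays in the abstract.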

    Heuristic search-based stacking of classifiers

    Currently, the combination of several classifiers is one of the most active fields within inductive learning. Examples of such techniques are boosting, bagging, and stacking. Of these three techniques, stacking is perhaps the least used. One of the main reasons is the difficulty of defining and parameterizing its components: selecting which combination of base classifiers to use and which classifier to use as the meta-classifier. The approach we present in this chapter poses this problem as an optimization task and then solves it with optimization techniques based on heuristic search. In particular, we apply genetic algorithms to automatically obtain the ideal combination of learning methods for the stacking system.
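
    Treating the stacking configuration as a search problem can be sketched with a small genetic algorithm. This is not the chapter's actual encoding or operators: it assumes a hypothetical genome made of one inclusion bit per candidate base classifier plus the index of the meta-classifier, and uses tournament selection, uniform crossover, bit-flip mutation, and elitism. In a real system the fitness would be something like the cross-validated accuracy of the stacked classifier each genome describes; here a toy stand-in (`toy_fitness`, scoring closeness to a hypothetical `TARGET` configuration) keeps the example self-contained.

```python
import random
from typing import Callable, Tuple

# Hypothetical encoding of one stacking configuration:
# (inclusion bit per candidate base classifier, index of the meta-classifier).
Genome = Tuple[Tuple[int, ...], int]


def repair(bits: Tuple[int, ...], rng: random.Random) -> Tuple[int, ...]:
    """A stacking system needs at least one base classifier."""
    if any(bits):
        return bits
    j = rng.randrange(len(bits))
    return tuple(int(i == j) for i in range(len(bits)))


def random_genome(n_base: int, n_meta: int, rng: random.Random) -> Genome:
    bits = tuple(rng.randint(0, 1) for _ in range(n_base))
    return repair(bits, rng), rng.randrange(n_meta)


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Uniform crossover: each gene is copied from a randomly chosen parent."""
    bits = tuple(x if rng.random() < 0.5 else y for x, y in zip(a[0], b[0]))
    meta = a[1] if rng.random() < 0.5 else b[1]
    return repair(bits, rng), meta


def mutate(g: Genome, n_meta: int, p: float, rng: random.Random) -> Genome:
    """Flip each inclusion bit, and redraw the meta-classifier, with probability p."""
    bits = tuple(1 - x if rng.random() < p else x for x in g[0])
    meta = rng.randrange(n_meta) if rng.random() < p else g[1]
    return repair(bits, rng), meta


def evolve(fitness: Callable[[Genome], float], n_base: int, n_meta: int,
           pop_size: int = 30, generations: int = 60, p_mut: float = 0.1,
           seed: int = 0) -> Genome:
    rng = random.Random(seed)
    pop = [random_genome(n_base, n_meta, rng) for _ in range(pop_size)]

    def tournament() -> Genome:
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        children = pop[:2]  # elitism: the two best survive unchanged
        while len(children) < pop_size:
            child = crossover(tournament(), tournament(), rng)
            children.append(mutate(child, n_meta, p_mut, rng))
        pop = children
    return max(pop, key=fitness)


TARGET: Genome = ((1, 0, 1, 1, 0), 2)  # hypothetical "ideal" configuration


def toy_fitness(g: Genome) -> float:
    # Stand-in for the accuracy of the stacked system that g describes.
    return sum(x == t for x, t in zip(g[0], TARGET[0])) + (g[1] == TARGET[1])


best = evolve(toy_fitness, n_base=5, n_meta=3)
```

    The `repair` step encodes the one hard constraint of the problem (a stacking system must keep at least one base classifier), so every genome the search visits describes a buildable ensemble.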

    On the practical nature of artificial qualia

    Proceedings of: 2010 Annual Convention of the Society for the Study of Artificial Intelligence and Simulation of Behaviour (AISB 2010), Leicester, UK, 29 March - 1 April, 2010. Can machines ever have qualia? Can we build robots with inner worlds of subjective experience? Will qualia experienced by robots be comparable to subjective human experience? Is the young field of Machine Consciousness (MC) ready to answer these questions? In this paper, rather than trying to answer these questions directly, we argue that a formal definition, or at least a functional characterization, of artificial qualia is required in order to establish valid engineering principles for synthetic phenomenology (SP). Understanding what the differences, if any, between natural and artificial qualia might be is one of the first questions to be answered. Furthermore, if an interim and less ambitious definition of artificial qualia can be outlined, the corresponding model can be implemented and used to shed some light on the very nature of consciousness. In this work we explore current trends in MC and SP from the perspective of artificial qualia, attempting to identify key features that could contribute to a practical characterization of this concept. We focus specifically on potential implementations of artificial qualia as a means to provide a new interdisciplinary tool for research on natural and artificial cognition. This work was supported in part by the Spanish Ministry of Education under CICYT grant TRA2007-67374-C02-02.

    Criteria for consciousness in artificial intelligent agents

    Proceedings of: Adaptive Learning Agents and Multi-Agent Systems, ALAMAS+ALAg 2008 – Workshop at AAMAS 2008, Estoril, Portugal, May 12, 2008. Accurately testing for consciousness is still an unsolved problem when applied to humans and other mammals. The inherently subjective nature of conscious experience makes it virtually inaccessible to classical empirical approaches. Therefore, alternative strategies based on behavior analysis and neurobiological studies are being developed in order to determine the level of consciousness of biological organisms. However, these methods cannot be directly applied to artificial systems. In this paper we propose both a taxonomy and some functional criteria that can be used to assess the level of consciousness of an artificial intelligent agent. Furthermore, a list of measurable levels of artificial consciousness, ConsScale, is defined as a tool to determine the potential level of consciousness of an agent. Both the mapping of consciousness to AI and the role of consciousness in cognition are controversial and unsolved questions; in this paper we aim to approach these issues with the notions of I-Consciousness and embodied intelligence. This research has been supported by the Spanish Ministry of Education and Science under project TRA2007-67374-C02-02.

    Strategies for measuring machine consciousness

    The accurate measurement of the level of consciousness of a creature remains a major scientific challenge; nevertheless, a number of new accounts that attempt to address this problem have recently been proposed. In this paper we analyze the principles of these new measures of consciousness, along with other classical approaches, focusing on their applicability to Machine Consciousness (MC). Furthermore, we propose a set of requirements that we think a suitable measure of MC should meet, and we discuss the associated theoretical and practical issues. Using the proposed requirements as a framework for the design of an integrative measure of consciousness, we explore the possibility of designing such a measure in the context of the current state of the art in consciousness studies. This work has been supported by CICYT grant TRA2007-67374-C02-02.

    Towards the generation of visual qualia in artificial cognitive architectures

    Proceedings of: Brain Inspired Cognitive Systems (BICS 2010), Madrid, Spain, 14-16 July, 2010. The nature and the generation of qualia in machines is a highly controversial issue. Even the existence of such a concept in the realm of artificial systems is often neglected or denied. In this work, we adopt a pragmatic approach to this problem using the Synthetic Phenomenology perspective. Specifically, we explore the generation of visual qualia in an artificial cognitive architecture inspired by the Global Workspace Theory (GWT). We argue that preliminary results obtained as part of this research line will help to characterize and identify artificial qualia as the direct products of conscious perception in machines. Additionally, we provide a computational model for integrated covert and overt perception in the framework of the GWT. A simple form of the apparent motion effect is used as a preliminary experimental context and a practical case study for the generation of synthetic visual experience. Thanks to an internal inspection subsystem, we are able to analyze both covert and overt percepts generated by our system when confronted with visual stimuli. The inspection of the internal states generated within the cognitive architecture enables us to discuss possible analogies with human cognition processes. This work was supported in part by the Spanish Ministry of Education under CICYT grant TRA2007-67374-C02-02.

    ConsScale: a plausible test for machine consciousness?

    Proceedings of: the Nokia Workshop on Machine Consciousness (in 13th Finnish Artificial Intelligence Conference, STeP 2008), Helsinki, Finland, August 21-22, 2008. Is consciousness a binary on/off property? Or is it, on the contrary, a complex phenomenon that can be present in different states, qualities, and degrees? We support the latter and propose a linear incremental scale for consciousness applicable to artificial agents. ConsScale is a novel agent taxonomy intended to classify agents according to their level of consciousness. Even though testing for consciousness remains an open question in the domain of biological organisms, we review current biological approaches and discuss how they might be adapted to the realm of artificial agents. Regarding the always controversial problem of phenomenology, in this work we have adopted a purely functional approach, in which we have defined a set of architectural and behavioral criteria for each level of consciousness. Thanks to this functional definition of the levels, we aim to specify a set of tests that can be used to unambiguously determine the highest level of consciousness present in the artificial agent under study. Additionally, since a number of objections can presumably be raised against our proposal, we have considered the most obvious critiques and tried to offer reasonable rebuttals to them. Because it neglects the phenomenological dimension of consciousness, our proposal might be considered reductionist and incomplete. However, we believe our account provides a valuable tool for assessing the level of consciousness of an agent, at least from a cognitive point of view. This research has also been supported by the Spanish Ministry of Education and Science under CICYT grant TRA2007-67374-C02-02.
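
    The functional approach (a set of criteria per level, with the agent assigned the highest level it reaches) can be sketched as a simple cumulative check. This is an illustration only: the level names and criteria below are hypothetical placeholders rather than ConsScale's published definitions, and the sketch assumes that a linear incremental scale means every lower level must be satisfied before a higher one can count.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical placeholder levels and criteria, NOT ConsScale's actual definitions.
SCALE: List[Tuple[str, Set[str]]] = [
    ("level-A", {"has_body"}),
    ("level-B", {"has_sensors", "has_actuators"}),
    ("level-C", {"reacts_to_stimuli"}),
    ("level-D", {"learns_from_experience"}),
]


def assess(satisfied: Set[str],
           scale: List[Tuple[str, Set[str]]] = SCALE) -> Optional[str]:
    """Return the highest level reached on an (assumed) cumulative scale.

    A level counts only if its own criteria AND those of every lower level
    are met, so a gap stops the climb even if higher criteria also hold.
    Returns None if not even the first level is reached.
    """
    reached: Optional[str] = None
    for name, criteria in scale:
        if not criteria <= satisfied:  # subset test: are all criteria present?
            break
        reached = name
    return reached


agent = {"has_body", "has_sensors", "has_actuators", "learns_from_experience"}
print(assess(agent))  # prints "level-B": level-C's criterion is missing
```

    Under this assumption the test is unambiguous by construction: the scan stops at the first unmet level, so the same set of observed criteria always maps to exactly one level.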